Problem Note 41139: Incorrect blocking and symbols with BOX= input data set

If a BOX= input data set to PROC BOXPLOT includes one or more block variables, the blocks are displayed incorrectly in the plot: the blocks and the box-and-whisker plots are misaligned. For example, the last box in each block may be displayed as the first box in the following block. If a symbol variable is specified with associated SYMBOLn statements, the plotted symbols are also misaligned. The first group variable level uses the SYMBOL2 statement instead of the SYMBOL1 statement, the second group variable level uses the SYMBOL3 statement, and so on. The same incorrect shift occurs when the default symbols are used.

The HISTORY= data set does not have any of these problems and can be used as a workaround.

Example

The following statements create a SAS data set containing diameter measurements for a part produced on three different machines:

data Parts;
  length Machine $ 4;
  label Sample  = 'Sample Number'
        Machine = 'Machine';
  input Sample $ Machine $ @;
  do i = 1 to 4;
    input Diameter @;
    output;
  end;
  drop i;
  datalines;
1  A386  4.32 4.55 4.16 4.44
2  A386  4.49 4.30 4.52 4.61
1  A455  4.45 4.56 4.38 4.51
2  A455  4.62 4.67 4.70 4.58
1  C334  4.16 4.28 4.31 4.59
2  C334  4.14 4.18 4.08 4.21
;

These statements create a box plot for the measurements in the Parts data set grouped into blocks by the block variable Machine:

symbol1 c=red v=dot;
symbol2 c=blue v=star;
title 'Box Plot for Diameter Grouped By Machine';
proc boxplot data=Parts;
    plot Diameter*Sample (Machine)=Sample / blockpos=3;
run;

The blocks and symbols are correct here, with two Samples per Machine:

[Figure: box plot produced from the DATA= input data set]

The following statements create a BOX= data set from the same Parts data for input to PROC BOXPLOT. A BOX= data set holds pre-summarized data, with one summary statistic or outlier value of a box-and-whisker plot per observation.

data BoxData;
  length _type_ $8;
  _var_='Diameter';
  do Machine='A386','A455','C334';
    do sample='1','2';
      do _type_='N','MIN','Q1','MEAN','MEDIAN','Q3','MAX','STDDEV';
        input _value_ @@;
        output;
      end;
    end;
  end;
  datalines;
4     4.16    4.240    4.3675    4.380    4.495    4.55    0.16721
4     4.30    4.395    4.4800    4.505    4.565    4.61    0.13038
4     4.38    4.415    4.4750    4.480    4.535    4.56    0.07767
4     4.58    4.600    4.6425    4.645    4.685    4.70    0.05315
4     4.16    4.220    4.3350    4.295    4.450    4.59    0.18193
4     4.08    4.110    4.1525    4.160    4.195    4.21    0.05620
;

Here is code to create the box plot using the pre-summarized BOX= data set:

title 'BOX= input data set';
proc boxplot box=BoxData;
    plot Diameter*Sample (Machine)=Sample / blockpos=3;
run;

The plot should be the same as the one above, but the blocks and symbols are misaligned:

[Figure: box plot produced from the BOX= input data set]

If only the pre-summarized data are available, the correct plot may be obtained by creating a HISTORY= data set instead of a BOX= data set. This type of input data set is structured to have one observation per individual box-and-whisker plot:

data HistData;
  do Machine='A386','A455','C334';
    do sample='1','2';
      input DiameterN DiameterL Diameter1 DiameterX DiameterM Diameter3
         DiameterH DiameterS;
      output;
    end;
  end;
  datalines;
4     4.16    4.240    4.3675    4.380    4.495    4.55    0.16721
4     4.30    4.395    4.4800    4.505    4.565    4.61    0.13038
4     4.38    4.415    4.4750    4.480    4.535    4.56    0.07767
4     4.58    4.600    4.6425    4.645    4.685    4.70    0.05315
4     4.16    4.220    4.3350    4.295    4.450    4.59    0.18193
4     4.08    4.110    4.1525    4.160    4.195    4.21    0.05620
;
title 'HISTORY= input data set';
proc boxplot history=HistData;
    plot Diameter*Sample (Machine)=Sample / blockpos=3;
run;

Now the box plot blocks and symbols are correct:

[Figure: box plot produced from the HISTORY= input data set]
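
When the summary statistics already exist as a BOX= data set, the HISTORY= layout can also be derived from it instead of being re-entered. The following statements are a minimal sketch of that conversion, not a definitive method. They assume the BoxData data set created above, a single analysis variable (Diameter), data already sorted by Machine and Sample, and exactly one observation per statistic per box. The output data set name HistFromBox is hypothetical. The WHERE= option keeps only the eight summary statistics, so any outlier observations in a BOX= data set would be dropped, because the HISTORY= layout has no place for them:

/* Turn the one-statistic-per-observation BOX= layout into one     */
/* observation per box, then rename the transposed columns to the  */
/* Diameter-prefixed names used by the HISTORY= data set above.    */
proc transpose
    data=BoxData(where=(_type_ in ('N','MIN','Q1','MEAN','MEDIAN','Q3','MAX','STDDEV')))
    out=HistFromBox(drop=_name_
                    rename=(N=DiameterN   MIN=DiameterL    Q1=Diameter1
                            MEAN=DiameterX MEDIAN=DiameterM Q3=Diameter3
                            MAX=DiameterH  STDDEV=DiameterS));
  by Machine sample;   /* one output observation per box            */
  id _type_;           /* statistic names become variable names     */
  var _value_;         /* statistic values fill those variables     */
run;

title 'HISTORY= data set derived from BoxData';
proc boxplot history=HistFromBox;
    plot Diameter*Sample (Machine)=Sample / blockpos=3;
run;

If the BOX= data set is not already ordered by the block and group variables, sort it first so that the BY statement in PROC TRANSPOSE is valid.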


Operating System and Release Information

Product Family: SAS System
Product: SAS/STAT

System                                                   Reported Release  Fixed Release*
z/OS                                                     9.1 TS1M0         9.3 TS1M0
Microsoft® Windows® for 64-Bit Itanium-based Systems     9.1 TS1M0         9.3 TS1M0
Microsoft Windows Server 2003 Datacenter 64-bit Edition  9.1 TS1M0         9.3 TS1M0
Microsoft Windows Server 2003 Enterprise 64-bit Edition  9.1 TS1M0         9.3 TS1M0
Microsoft Windows 2000 Advanced Server                   9.1 TS1M0
Microsoft Windows 2000 Datacenter Server                 9.1 TS1M0
Microsoft Windows 2000 Server                            9.1 TS1M0
Microsoft Windows 2000 Professional                      9.1 TS1M0
Microsoft Windows NT Workstation                         9.1 TS1M0
Microsoft Windows Server 2003 Datacenter Edition         9.1 TS1M0         9.3 TS1M0
Microsoft Windows Server 2003 Enterprise Edition         9.1 TS1M0         9.3 TS1M0
Microsoft Windows Server 2003 Standard Edition           9.1 TS1M0         9.3 TS1M0
Microsoft Windows XP Professional                        9.1 TS1M0         9.3 TS1M0
64-bit Enabled AIX                                       9.1 TS1M0         9.3 TS1M0
64-bit Enabled HP-UX                                     9.1 TS1M0         9.3 TS1M0
64-bit Enabled Solaris                                   9.1 TS1M0         9.3 TS1M0
HP-UX IPF                                                9.1 TS1M0         9.3 TS1M0
Linux                                                    9.1 TS1M0         9.3 TS1M0
OpenVMS Alpha                                            9.1 TS1M0         9.3 TS1M0
Tru64 UNIX                                               9.1 TS1M0         9.3 TS1M0
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.